Exercise 15.1
Fill in the blanks in each of the following:
a) A self-________ class is used to form dynamic data structures. that can
grow and shrink at execution time
b) Operator ________ is used to dynamically allocate memory; this operator
returns a pointer to the allocated memory.
c) A ________ is a constrained version of a linked list in which nodes can be
inserted and deleted only from the start of the list; this data structure returns
node values in last-in-first-out order.
d) A function that does not alter a linked list, but simply looks at the list to
determine if it is empty is referred to as a ________function.
e) A queue is referred to as a ________ data structure because the first nodes
inserted are the first nodes removed.
f) The pointer to the next node in a linked list is referred to as a ________.
g) Operator ________ is used to reclaim dynamically allocated memory.
h) A ________ is a constrained version of a linked list in which nodes can be
inserted only at the end of the list and deleted only from the start of the list.
i) A ________ is a nonlinear, two-dimensional data structure that contains
nodes with two or more links.
j) A stack is referred to as a ________ data structure because the last node
inserted is the first node removed.
k) The nodes of a ________ tree contain two link members.
l) The first node of a tree is the ________ node.
m) Each link in a tree node points to a ________ or ________ of that node.
n) A tree node that has no children is called a ________ node.
o) The four traversal algorithms we mentioned in the text for binary search trees
are ______, ______, _______ and ______.
Exercise 15.2
What are the differences between a linked list and a stack?
Exercise 15.3
What are the differences between a stack and a queue?
Exercise 15.4
Perhaps a more appropriate title for this chapter would have been "Reusable
Data Structures." Comment on how each of the following entities or concepts
contributes to the reusability of data structures:
a) classes
b) template classes
c) inheritance
d) private inheritance
e) composition.
Exercise 15.5
Manually provide the inorder, preorder, and postorder traversals of the binary
search tree of
Fig. 15.19.
Exercise 15.6
Write a program that concatenates two linked list objects of characters. The
program should include function concatenate that takes references to both list
objects as arguments and concatenates the second list to the first list.
Exercise 15.7
Write a program that merges two ordered list objects of integers into a single
ordered list object of integers. Function merge should receive references to each
of the list objects to be merged, and should return a reference to the merged list
object.
Exercise 15.8
Write a program that inserts 25 random integers from 0 to 100 in order in a
linked list object. The program should calculate the sum of the elements, and the
floating-point average of the elements.
Exercise 15.9
Write a program that creates a linked list object of 10 characters, then creates a
second list object containing a copy of the first list, but in reverse order.
Exercise 15.10
Write a program that inputs a line of text and uses a stack object to print the line
reversed.
Exercise 15.11
Write a program that uses a stack object to determine if a string is a palindrome
(i.e., the string is spelled identically backwards and forwards). The program
should ignore spaces and punctuation.
Exercise 15.12
Stacks are used by compilers to help in the process of evaluating expressions
and generating machine language code. In this and the next exercise, we
investigate how compilers evaluate arithmetic expressions consisting only of
constants, operators, and parentheses.
Humans generally write expressions like 3 + 4 and 7 / 9 in which the operator (+
or / here) is written between its operands--this is called infix notation.
Computers "prefer" postfix notation in which the operator is written to the right
of its two operands. The preceding infix expressions would appear in postfix
notation as 3 4 + and 7 9 /, respectively.
To evaluate a complex infix expression, a compiler would first convert the
expression to postfix notation, and then evaluate the postfix version of the
expression. Each of these algorithms requires only a single left-to-right pass of
the expression. Each algorithm uses a stack object in support of its operation,
and in each algorithm the stack is used for a different purpose.
In this exercise, you will write a C++ version of the infix-to-postfix conversion
algorithm. In the next exercise, you will write a C++ version of the postfix
expression evaluation algorithm. Later in the chapter, you will discover that
code you write in this exercise can help you implement a complete working
compiler.
Write a program that converts an ordinary infix arithmetic expression (assume a
valid expression is entered) with single digit integers such as
(6 + 2) * 5 - 8 / 4
to a postfix expression. The postfix version of the preceding infix expression is
6 2 + 5 * 8 4 / -
The program should read the expression into character array infix, and use
modified versions of the stack functions implemented in this chapter to help
create the postfix expression in character array postfix. The algorithm for
creating a postfix expression is as follows:
1. Push a left parenthesis '(' onto the stack.
2. Append a right parenthesis ')' to the end of infix.
While the stack is not empty, read infix from left to right and do the following:
If the current character in infix is a digit, copy it to the next element of postfix.
If the current character in infix is a left parenthesis, push it onto the stack.
If the current character in infix is an operator,
3. Pop operators (if there are any) at the top of the stack while they have equal or
higher precedence than the current operator, and insert the popped
operators in postfix.
Push the current character in infix onto the stack.
If the current character in infix is a right parenthesis
Pop operators from the top of the stack and insert them in postfix until a left
parenthesis is at the top of the stack.
Pop (and discard) the left parenthesis from the stack.
The following arithmetic operations are allowed in an expression:
+addition
-subtraction
*multiplication
/division
^exponentiation
%modulus
The stack should be maintained with stack nodes that each contain a data
member and a pointer to the next stack node.
Some of the functional capabilities you may want to provide are:
a) Function convertToPostfix that converts the infix expression to postfix
notation.
b) Function isOperator that determines if c is an operator.
c) Function precedence that determines if the precedence of operator1 is less
than, equal to, or greater than the precedence of operator2. The function returns
-1, 0, and 1, respectively.
d) Function push that pushes a value onto the stack.
e) Function pop that pops a value off the stack.
f) Function stackTop that returns the top value of the stack without popping the
stack.
g) Function isEmpty that determines if the stack is empty.
h) Function printStack that prints the stack.
Exercise 15.13
Write a program that evaluates a postfix expression (assume it is valid) such as
6 2 + 5 * 8 4 / -
The program should read a postfix expression consisting of digits and operators
into a character array. Using modified versions of the stack functions
implemented earlier in this chapter, the program should scan the expression and
evaluate it. The algorithm is as follows:
1. Append the null character ('\\0') to the end of the postfix expression. When
the null character is encountered, no further processing is necessary.
While '\\0' has not been encountered, read the expression from left to right.
If the current character is a digit,
Push its integer value onto the stack (the integer value of a digit character is
2. its
value in the computer's character set minus the value of ' 0' in the computer's
character set).
Otherwise, if the current character is an operator,
Pop the two top elements of the stack into variables x and y.
Calculate y operator x.
Push the result of the calculation onto the stack.
3. When the null character is encountered in the expression, pop the top value of
the stack. This is the result of the postfix expression.
Note: In 2) above, if the operator is '/', the top of the stack is 2, and the next
element in the stack is 8, then pop 2 into x, pop 8 into y, evaluate 8 / 2, and push
the result, 4, back onto the stack. This note also applies to operator '-'. The
arithmetic operations allowed in an expression are:
+addition
-subtraction
*multiplication
/division
^exponentiation
%modulus
The stack should be maintained with stack nodes that contain an int data
member and a pointer to the next stack node. You may want to provide the
following functional capabilities:
a) Function evaluatePostfixExpression that evaluates the postfix expression.
b) Function calculate that evaluates the expression op1 operator op2.
c) Function push that pushes a value onto the stack.
d) Function pop that pops a value off the stack.
e) Function isEmpty that determines if the stack is empty.
f) Function printStack that prints the stack.
Exercise 15.14
Modify the postfix evaluator program of
Exercise 15.13 so that it can process
integer operands larger than 9.
Exercise 15.15
(Supermarket simulation) Write a program that simulates a check-out line at a
supermarket. The line is a queue object. Customers (i.e., customer objects)
arrive in random integer intervals of 1 to 4 minutes. Also, each customer is
serviced in random integer intervals of 1 to 4 minutes. Obviously, the rates need
to be balanced. If the average arrival rate is larger than the average service rate,
the queue will grow infinitely. Even with "balanced" rates, randomness can still
cause long lines. Run the supermarket simulation for a 12-hour day (720
minutes) using the following algorithm:
1. Choose a random integer between 1 and 4 to determine the minute at which
the first customer arrives.
2. At the first customer's arrival time:
Determine customer's service time (random integer from 1 to 4);
Begin servicing the customer;
Schedule arrival time of next customer (random integer 1 to 4 added to the current time).
For each minute of the day:
If the next customer arrives,
Say so,
Enqueue the customer;
Schedule the arrival time of the next customer;
3. If service was completed for the last customer;
Say so
Dequeue next customer to be serviced
Determine customer's service completion time (random integer from 1 to 4
added to the current time).
Now run your simulation for 720 minutes and answer each of the following:
a) What is the maximum number of customers in the queue at any time?
b) What is the longest wait any one customer experiences?
c) What happens if the arrival interval is changed from 1-to-4 minutes to 1-to-3
minutes?
Exercise 15.16
Modify the program of
Fig. 15.16 to allow the binary tree object to contain
duplicates.
Exercise 15.17
Write a program based on the program of
Fig. 15.16 that inputs a line of text,
tokenizes the sentence into separate words (you may want to use the strtok
library function), inserts the words in a binary search tree, and prints the inorder,
preorder, and postorder traversals of the tree. Use an OOP approach.
Exercise 15.18
In this chapter, we saw that duplicate elimination is straightforward when
creating a binary search tree. Describe how you would perform duplicate
elimination using only a single-subscripted array. Compare the performance of
array-based duplicate elimination with the performance of binary-search-tree-
based duplicate elimination.
Exercise 15.19
Write a function depth that receives a binary tree and determines how many
levels it has.
Exercise 15.20
(Recursively print a list backwards) Write a member function
printListBackwards that recursively outputs the items in a linked list object in
reverse order. Write a test program that creates a sorted list of integers and prints
the list in reverse order.
Exercise 15.21
(Recursively search a list) Write a member function searchList that recursively
searches a linked list object for a specified value. The function should return a
pointer to the value if it is found; otherwise, null should be returned. Use your
function in a test program that creates a list of integers. The program should
prompt the user for a value to locate in the list.
Exercise 15.22
(Binary tree delete) In this exercise, we discuss deleting items from binary
search trees. The deletion algorithm is not as straightforward as the insertion
algorithm. There are three cases that are encountered when deleting an item--
the item is contained in a leaf node (i.e., it has no children), the item is contained
in a node that has one child, or the item is contained in a node that has two
children.
If the item to be deleted is contained in a leaf node, the node is deleted and the
pointer in the parent node is set to null.
If the item to be deleted is contained in a node with one child, the pointer in the
parent node is set to point to the child node and the node containing the data item
is deleted. This causes the child node to take the place of the deleted node in the
tree.
The last case is the most difficult. When a node with two children is deleted,
another node in the tree must take its place. However, the pointer in the parent
node cannot simply be assigned to point to one of the children of the node to be
deleted. In most cases, the resulting binary search tree would not adhere to the
following characteristic of binary search trees (with no duplicate values): The
values in any left subtree are less than the value in the parent node, and the
values in any right subtree are greater than the value in the parent node.
Which node is used as a replacement node to maintain this characteristic? Either
the node containing the largest value in the tree less than the value in the node
being deleted, or the node containing the smallest value in the tree greater than
the value in the node being deleted. Let us consider the node with the smaller
value. In a binary search tree, the largest value less than a parent's value is
located in the left subtree of the parent node and is guaranteed to be contained in
the rightmost node of the subtree. This node is located by walking down the left
subtree to the right until the pointer to the right child of the current node is null.
We are now pointing to the replacement node which is either a leaf node or a
node with one child to its left. If the replacement node is a leaf node, the steps to
perform the deletion are as follows:
1. Store the pointer to the node to be deleted in a temporary pointer variable (this
pointer is used to delete the dynamically allocated memory)
2. Set the pointer in the parent of the node being deleted to point to the replacement node
3. Set the pointer in the parent of the replacement node to null
4. Set the pointer to the right subtree in the replacement node to point to the right
subtree of the node to be deleted
5. Delete the node to which the temporary pointer variable points.
The deletion steps for a replacement node with a left child are similar to those
for a replacement node with no children, but the algorithm also must move the
child in to the replacement node's position in the tree. If the replacement node is
a node with a left child, the steps to perform the deletion are as follows:
1. Store the pointer to the node to be deleted in a temporary pointer variable
2. Set the pointer in the parent of the node being deleted to point to the replacement node
3. Set the pointer in the parent of the replacement node to point to the left child
of the replacement node
4. Set the pointer to the right subtree in the replacement node to point to the right
subtree of the node to be deleted
5. Delete the node to which the temporary pointer variable points.
Write member function deleteNode which takes as its arguments a pointer to the
root node of the tree object and the value to be deleted. The function should
locate in the tree the node containing the value to be deleted and use the
algorithms discussed here to delete the node. If the value is not found in the tree,
the function should print a message that indicates whether or not the value is
deleted. Modify the program of
Fig. 15.16 to use this function. After deleting an
item, call the inOrder, preOrder, and postOrder traversal functions to confirm
that the delete operation was performed correctly.
Exercise 15.23
(Binary tree search) Write member function binaryTreeSearch that attempts to
locate a specified value in a binary search tree object. The function should take
as arguments a pointer to the root node of the binary tree and a search key to be
located. If the node containing the search key is found, the function should
return a pointer to that node; otherwise, the function should return a null pointer.
Exercise 15.24
(Level-order binary tree traversal) The program of
Fig. 15.16 illustrated three
recursive methods of traversing a binary tree--inorder, preorder, and postorder
traversals. This exercise presents the level-order traversal of a binary tree in
which the node values are printed level-by-level starting at the root node level.
The nodes on each level are printed from left to right. The level-order traversal is
not a recursive algorithm. It uses a queue object to control the output of the
nodes. The algorithm is as follows:
1. Insert the root node in the queue
2. While there are nodes left in the queue,
Get the next node in the queue
Print the node's value
If the pointer to the left child of the node is not null
Insert the left child node in the queue
If the pointer to the right child of the node is not null
Insert the right child node in the queue.
Write member function levelOrder to perform a level-order traversal of a binary
tree object. Modify the program of
Fig 15.16 to use this function. (Note: You
will also need to modify and incorporate the queue processing functions of
Fig.
15.12 in this program.)
Exercise 15.25
(Printing trees) Write a recursive member function outputTree to display a
binary tree object on the screen. The function should output the tree row-by-row
with the top of the tree at the left of the screen and the bottom of the tree toward
the right of the screen. Each row is output vertically. For example, the binary
tree illustrated in
Fig. 15.19 is output as follows:

Note the rightmost leaf node appears at the top of the output in the rightmost
column and the root node appears at the left of the output. Each column of
output starts five spaces to the right of the previous column. Function
outputTree should receive an argument totalSpaces representing the number of
spaces preceding the value to be output (this variable should start at zero so the
root node is output at the left of the screen). The function uses a modified
inorder traversal to output the tree--it starts at the rightmost node in the tree and
works back to the left. The algorithm is as follows:
While the pointer to the current node is not null
Recursively call outputTree with the right subtree of the current node and
totalSpaces + 5
Use a for structure to count from 1 to totalSpaces and output spaces
Output the value in the current node
Set the pointer to the current node to point to the left subtree of the current node
Increment totalSpaces by 5.
Special Section: Building Your Own Compiler
In Exercises 5.18 and 5.19, we introduced Simpletron Machine Language
(SML) and you implemented a Simpletron computer simulator to execute
programs written in SML. In this section, we build a compiler that converts
programs written in a high-level programming language to SML. This section
"ties" together the entire programming process. You will write programs in this
new high-level language, compile these programs on the compiler you build,
and run the programs on the simulator you built in Exercise 7.19. You should
make every effort to implement your compiler in an object-oriented manner.
Exercise 15.26
(The Simple Language) Before we begin building the compiler, we discuss a
simple, yet powerful, high-level language similar to early versions of the
popular language BASIC. We call the language Simple. Every Simple statement
consists of a line number and a Simple instruction. Line numbers must appear in
ascending order. Each instruction begins with one of the following Simple
commands: rem, input, let, print, goto, if/goto, or end (see
Fig. 15.20). All
commands except end can be used repeatedly. Simple evaluates only integer
expressions using the +, -, *, and / operators. These operators have the same
precedence as in C. Parentheses can be used to change the order of evaluation of
an expression.
Our Simple compiler recognizes only lowercase letters. All characters in a
Simple file should be lowercase (uppercase letters result in a syntax error unless
they appear in a rem statement in which case they are ignored). A variable name
is a single letter. Simple does not allow descriptive variable names, so variables
should be explained in remarks to indicate their use in a program. Simple uses
only integer variables. Simple does not have variable declarations--merely
mentioning a variable name in a program causes the variable to be declared and
initialized to zero automatically. The syntax of Simple does not allow string
manipulation (reading a string, writing a string, comparing strings, etc.). If a
string is encountered in a Simple program (after a command other than rem), the
compiler generates a syntax error. The first version of our compiler will assume
that Simple programs are entered correctly.
Exercise 15.29 asks the student to
modify the compiler to perform syntax error checking.
Simple uses the conditional if/goto statement and the unconditional goto
statement to alter the flow of control during program execution. If the condition
in the if/goto statement is true, control is transferred to a specific line of the
program. The following relational and equality operators are valid in an if/goto
statement: <, >, <=, >=, ==, or !=. The precedence of these operators is the same
as in C++.
Let us now consider several programs that demonstrate Simple's features. The
first program (
Fig. 15.21) reads two integers from the keyboard, stores the
values in variables a and b, and computes and prints their sum (stored in variable
c).
The program of
Fig. 15.22 determines and prints the larger of two integers. The
integers are input from the keyboard and stored in s and t. The if/goto statement
tests the condition s >= t. If the condition is true, control is transferred to line 90
and s is output; otherwise, t is output and control is transferred to the end
statement in line 99 where the program terminates.
Simple does not provide a repetition structure (such as C++'s for, while, or do/
while). However, Simple can simulate each of C++'s repetition structures using
the if/goto and goto statements.
Figure 15.23 uses a sentinel-controlled loop to
calculate the squares of several integers. Each integer is input from the keyboard
and stored in variable j. If the value entered is the sentinel -9999, control is
transferred to line 99 where the program terminates. Otherwise, k is assigned the
square of j, k is output to the screen, and control is passed to line 20 where the
next integer is input.
Using the sample programs of
Fig. 15.21,
Fig. 15.22, and
Fig. 15.23 as your
guide, write a Simple program to accomplish each of the following:
a) Input three integers, determine their average, and print the result.
b) Use a sentinel-controlled loop to input 10 integers and compute and print
their sum.
c) Use a counter-controlled loop to input 7 integers, some positive and some
negative, and compute and print their average.
d) Input a series of integers and determine and print the largest. The first integer
input indicates how many numbers should be processed.
e) Input 10 integers and print the smallest.
f) Calculate and print the sum of the even integers from 2 to 30.
g) Calculate and print the product of the odd integers from 1 to 9.
Exercise 15.27
(Building A Compiler; Prerequisite: Complete Exercises 5.18, 5.19,
15.12,
15.13, and
15.26) Now that the Simple language has been presented (
Exercise
15.26), we discuss how to build a Simple compiler. First, we consider the
process by which a Simple program is converted to SML and executed by the
Simpletron simulator (see
Fig. 15.24). A file containing a Simple program is
read by the compiler and converted to SML code. The SML code is output to a
file on disk, in which SML instructions appear one per line. The SML file is then
loaded into the Simpletron simulator, and the results are sent to a file on disk and
to the screen. Note that the Simpletron program developed in Exercise 5.19 took
its input from the keyboard. It must be modified to read from a file so it can run
the programs produced by our compiler.
The Simple compiler performs two passes of the Simple program to convert it to
SML. The first pass constructs a symbol table (object) in which every line
number (object), variable name (object) and constant (object) of the Simple
program is stored with its type and corresponding location in the final SML code
(the symbol table is discussed in detail below). The first pass also produces the
corresponding SML instruction object(s) for each of the Simple statements
(object, etc.). As we will see, if the Simple program contains statements that
transfer control to a line later in the program, the first pass results in an SML
program containing some "unfinished" instructions. The second pass of the
compiler locates and completes the unfinished instructions, and outputs the SML
program to a file.
First Pass
The compiler begins by reading one statement of the Simple program into
memory. The line must be separated into its individual tokens (i.e., "pieces" of a
statement) for processing and compilation (standard library function strtok can
be used to facilitate this task). Recall that every statement begins with a line
number followed by a command. As the compiler breaks a statement into
tokens, if the token is a line number, a variable, or a constant, it is placed in the
symbol table. A line number is placed in the symbol table only if it is the first
token in a statement. The symbolTable object is an array of tableEntry objects
representing each symbol in the program. There is no restriction on the number
of symbols that can appear in the program. Therefore, the symbolTable for a
particular program could be large. Make the symbolTable a 100-element array
for now. You can increase or decrease its size once the program is working.
Each tableEntry object contains three members. Member symbol is an integer
containing the ASCII representation of a variable (remember that variable
names are single characters), a line number, or a constant. Member type is one
of the following characters indicating the symbol's type: 'C' for constant, 'L'
for line number, or 'V' for variable. Member location contains the Simpletron
memory location (00 to 99) to which the symbol refers. Simpletron memory is
an array of 100 integers in which SML instructions and data are stored. For a
line number, the location is the element in the Simpletron memory array at
which the SML instructions for the Simple statement begin. For a variable or
constant, the location is the element in the Simpletron memory array in which
the variable or constant is stored. Variables and constants are allocated from the
end of Simpletron's memory backwards. The first variable or constant is stored
in location at 99, the next in location at 98, etc.
The symbol table plays an integral part in converting Simple programs to SML.
We learned in Chapter 5 that an SML instruction is a four-digit integer
comprised of two parts--the operation code and the operand. The operation
code is determined by commands in Simple. For example, the simple command
input corresponds to SML operation code 10 (read), and the Simple command
print corresponds to SML operation code 11 (write). The operand is a memory
location containing the data on which the operation code performs its task (e.g.,
operation code 10 reads a value from the keyboard and stores it in the memory
location specified by the operand). The compiler searches symbolTable to
determine the Simpletron memory location for each symbol so the
corresponding location can be used to complete the SML instructions.
The compilation of each Simple statement is based on its command. For
example, after the line number in a rem statement is inserted in the symbol
table, the remainder of the statement is ignored by the compiler because a
remark is for documentation purposes only. The input, print, goto and end
statements correspond to the SML read, write, branch (to a specific location)
and halt instructions. Statements containing these Simple commands are
converted directly to SML (note that a goto statement may contain an
unresolved reference if the specified line number refers to a statement further
into the Simple program file; this is sometimes called a forward reference).
When a goto statement is compiled with an unresolved reference, the SML
instruction must be flagged to indicate that the second pass of the compiler must
complete the instruction. The flags are stored in 100-element array flags of type
int in which each element is initialized to -1. If the memory location to which a
line number in the Simple program refers is not yet known (i.e., it is not in the
symbol table), the line number is stored in array flags in the element with the
same subscript as the incomplete instruction. The operand of the incomplete
instruction is set to 00 temporarily. For example, an unconditional branch
instruction (making a forward reference) is left as +4000 until the second pass of
the compiler. The second pass of the compiler will be described shortly.
Compilation of if/goto and let statements is more complicated than other
statements--they are the only statements that produce more than one SML
instruction. For an if/goto statement, the compiler produces code to test the
condition and to branch to another line if necessary. The result of the branch
could be an unresolved reference. Each of the relational and equality operators
can be simulated using SML's branch zero and branch negative instructions (or
possibly a combination of both).
For a let statement, the compiler produces code to evaluate an arbitrarily
complex arithmetic expression consisting of integer variables and/or constants.
Expressions should separate each operand and operator with spaces.
Exercises
15.12 and
15.13 presented the infix-to-postfix conversion algorithm and the
postfix evaluation algorithm used by compilers to evaluate expressions. Before
proceeding with your compiler, you should complete each of these exercises.
When a compiler encounters an expression, it converts the expression from infix
notation to postfix notation, then evaluates the postfix expression.
How is it that the compiler produces the machine language to evaluate an
expression containing variables? The postfix evaluation algorithm contains a
"hook" where the compiler can generate SML instructions rather than actually
evaluating the expression. To enable this "hook" in the compiler, the postfix
evaluation algorithm must be modified to search the symbol table for each
symbol it encounters (and possibly insert it), determine the symbol's
corresponding memory location, and push the memory location onto the stack
(instead of the symbol). When an operator is encountered in the postfix
expression, the two memory locations at the top of the stack are popped and
machine language for effecting the operation is produced using the memory
locations as operands. The result of each subexpression is stored in a temporary
location in memory and pushed back onto the stack so the evaluation of the
postfix expression can continue. When postfix evaluation is complete, the
memory location containing the result is the only location left on the stack. This
is popped and SML instructions are generated to assign the result to the variable
at the left of the let statement.
Second Pass
The second pass of the compiler performs two tasks: resolve any unresolved
references and output the SML code to a file. Resolution of references occurs as
follows:
a) Search the flags array for an unresolved reference (i.e., an element with a
value other than -1).
b) Locate the object in array symbolTable containing the symbol stored in the
flags array (be sure that the type of the symbol is 'L' for line number).
c) Insert the memory location from member location into the instruction with
the unresolved reference (remember that an instruction containing an unresolved
reference has operand 00).
d) Repeat steps 1, 2, and 3 until the end of the flags array is reached.
After the resolution process is complete, the entire array containing the SML
code is output to a disk file with one SML instruction per line. This file can be
read by the Simpletron for execution (after the simulator is modified to read its
input from a file). Compiling your first Simple program into an SML file and
then executing that file should give you a real sense of personal
accomplishment.
A Complete Example
The following example illustrates a complete conversion of a Simple program to
SML as it will be performed by the Simple compiler. Consider a Simple
program that inputs an integer and sums the values from 1 to that integer. The
program and the SML instructions produced by the first pass of the Simple
compiler are illustrated in
Fig. 15.25. The symbol table constructed by the first
pass is shown in
Fig. 15.26.
Most Simple statements convert directly to single SML instructions. The
exceptions in this program are remarks, the if/goto statement in line 20, and the
let statements. Remarks do not translate into machine language. However, the
line number for a remark is placed in the symbol table in case the line number is
referenced in a goto statement or an if/goto statement. Line 20 of the program
specifies that if the condition y == x is true, program control is transferred to line
60. Because line 60 appears later in the program, the first pass of the compiler
has not as yet placed 60 in the symbol table (statement line numbers are placed
in the symbol table only when they appear as the first token in a statement).
Therefore, it is not possible at this time to determine the operand of the SML
branch zero instruction at location 03 in the array of SML instructions. The
compiler places 60 in location 03 of the flags array to indicate that the second
pass completes this instruction.
We must keep track of the next instruction location in the SML array because
there is not a one-to-one correspondence between Simple statements and SML
instructions. For example, the if/goto statement of line 20 compiles into three
SML instructions. Each time an instruction is produced, we must increment the
instruction counter to the next location in the SML array. Note that the size of
Simpletron's memory could present a problem for Simple programs with many
statements, variables and constants. It is conceivable that the compiler will run
out of memory. To test for this case, your program should contain a data counter
to keep track of the location at which the next variable or constant will be stored
in the SML array. If the value of the instruction counter is larger than the value
of the data counter, the SML array is full. In this case, the compilation process
should terminate and the compiler should print an error message indicating that
it ran out of memory during compilation. This serves to emphasize that although
the programmer is freed from the burdens of managing memory by the compiler,
the compiler itself must carefully determine the placement of instructions and
data in memory, and must check for such errors as memory being exhausted
during the compilation process.
A Step-by-Step View of the Compilation Process
Let us now walk through the compilation process for the Simple program in
Fig.
15.25. The compiler reads the first line of the program
5 rem sum 1 to x
into memory. The first token in the statement (the line number) is determined
using strtok (see Chapters 5 and 16 for a discussion of C++'s string manipulation
functions). The token returned by strtok is converted to an integer using atoi so
the symbol 5 can be located in the symbol table. If the symbol is not found, it is
inserted in the symbol table. Since we are at the beginning of the program and
this is the first line, no symbols are in the table yet. So, 5 is inserted into the
symbol table as type L (line number) and assigned the first location in SML
array (00). Although this line is a remark, a space in the symbol table is still
allocated for the line number (in case it is referenced by a goto or an if/goto). No
SML instruction is generated for a rem statement, so the instruction counter is
not incremented.
The statement
10 input x
is tokenized next. The line number 10 is placed in the symbol table as type L and
assigned the first location in the SML array (00 because a remark began the
program so the instruction counter is currently 00). The command input
indicates that the next token is a variable (only a variable can appear in an input
statement). Because input corresponds directly to an SML operation code, the
compiler simply has to determine the location of x in the SML array. Symbol x is
not found in the symbol table. So, it is inserted into the symbol table as the
ASCII representation of x, given type V, and assigned location 99 in the SML
array (data storage begins at 99 and is allocated backwards). SML code can now
be generated for this statement. Operation code 10 (the SML read operation
code) is multiplied by 100, and the location of x (as determined in the symbol
table) is added to complete the instruction. The instruction is then stored in the
SML array at location 00. The instruction counter is incremented by 1 because a
single SML instruction was produced.
The statement
15 rem check y == x
is tokenized next. The symbol table is searched for line number 15 (which is not
found). The line number is inserted as type L and assigned the next location in
the array, 01 (remember that rem statements do not produce code, so the
instruction counter is not incremented).
The statement
20 if y == x goto 60
is tokenized next. Line number 20 is inserted in the symbol table and given type
L with the next location in the SML array 01. The command if indicates that a
condition is to be evaluated. The variable y is not found in the symbol table, so it
is inserted and given the type V and the SML location 98. Next, SML
instructions are generated to evaluate the condition. Since there is no direct
equivalent in SML for the if/goto, it must be simulated by performing a
calculation using x and y and branching based on the result. If y is equal to x, the
result of subtracting x from y is zero, so the branch zero instruction can be used
with the result of the calculation to simulate the if/goto statement. The first step
requires that y be loaded (from SML location 98) into the accumulator. This
produces the instruction 01 +2098. Next, x is subtracted from the accumulator.
This produces the instruction 02 +3199. The value in the accumulator may be
zero, positive, or negative. Since the operator is ==, we want to branch zero.
First, the symbol table is searched for the branch location (60 in this case),
which is not found. So, 60 is placed in the flags array at location 03, and the
instruction 03 +4200 is generated (we cannot add the branch location because
we have not assigned a location to line 60 in the SML array yet). The instruction
counter is incremented to 04.
The compiler proceeds to the statement
25 rem increment y
The line number 25 is inserted in the symbol table as type L and assigned SML
location 04. The instruction counter is not incremented.
When the statement
30 let y = y + 1
is tokenized, the line number 30 is inserted in the symbol table as type L and
assigned SML location 04. Command let indicates that the line is an assignment
statement. First, all the symbols on the line are inserted in the symbol table (if
they are not already there). The integer 1 is added to the symbol table as type C
and assigned SML location 97. Next, the right side of the assignment is
converted from infix to postfix notation. Then the postfix expression (y 1 +) is
evaluated. Symbol y is located in the symbol table and its corresponding
memory location is pushed onto the stack. Symbol 1 is also located in the
symbol table and its corresponding memory location is pushed onto the stack.
When the operator + is encountered, the postfix evaluator pops the stack into the
right operand of the operator and pops the stack again into the left operand of the
operator, then produces the SML instructions
04 +2098 (load y)
05 +3097 (add 1)
The result of the expression is stored in a temporary location in memory (96)
with instruction
06 +2196 (store temporary)
and the temporary location is pushed on the stack. Now that the expression has
been evaluated, the result must be stored in y (i.e., the variable on the left side of
=). So, the temporary location is loaded into the accumulator and the
accumulator is stored in y with the instructions
07 +2096 (load temporary)
08 +2198 (store y)
The reader will immediately notice that SML instructions appear to be
redundant. We will discuss this issue shortly.
When the statement
35 rem add y to total
is tokenized, line number 35 is inserted in the symbol table as type L and
assigned location 09.
The statement
40 let t = t + y
is similar to line 30. The variable t is inserted in the symbol table as type V and
assigned SML location 95. The instructions follow the same logic and format as
line 30, and the instructions 09 +2095, 10 +3098, 11 +2194, 12 +2094, and 13
+2195 are generated. Note that the result of t + y is assigned to temporary
location 94 before being assigned to t (95). Once again, the reader will note that
the instructions in memory locations 11 and 12 appear to be redundant. Again,
we will discuss this shortly.
The statement
45 rem loop y
is a remark, so line 45 is added to the symbol table as type L and assigned SML
location 14.
The statement
50 goto 20
transfers control to line 20. Line number 50 is inserted in the symbol table as
type L and assigned SML location 14. The equivalent of goto in SML is the
unconditional branch (40) instruction that transfers control to a specific SML
location. The compiler searches the symbol table for line 20 and finds that it
corresponds to SML location 01. The operation code (40) is multiplied by 100
and location 01 is added to it to produce the instruction 14 +4001.
The statement
55 rem output result
is a remark, so line 55 is inserted in the symbol table as type L and assigned
SML location 15.
The statement
60 print t
is an output statement. Line number 60 is inserted in the symbol table as type L
and assigned SML location 15. The equivalent of print in SML is operation
code 11 (write). The location of t is determined from the symbol table and added
to the result of the operation code multiplied by 100.
The statement
99 end
is the final line of the program. Line number 99 is stored in the symbol table as
type L and assigned SML location 16. The end command produces the SML
instruction +4300 (43 is halt in SML) which is written as the final instruction in
the SML memory array.
This completes the first pass of the compiler. We now consider the second pass.
The flags array is searched for values other than -1. Location 03 contains 60, so
the compiler knows that instruction 03 is incomplete. The compiler completes
the instruction by searching the symbol table for 60, determining its location,
and adding the location to the incomplete instruction. In this case, the search
determines that line 60 corresponds to SML location 15, so the completed
instruction 03 +4215 is produced replacing 03 +4200. The Simple program has
now been compiled successfully.
To build the compiler, you will have to perform each of the following tasks:
a) Modify the Simpletron simulator program you wrote in Exercise 5.19 to take
its input from a file specified by the user (see Chapter 14). The simulator should
output its results to a disk file in the same format as the screen output. Convert
the simulator to be an object-oriented program. In particular, make each part of
the hardware an object. Arrange the instruction types into a class hierarchy using
inheritance. Then execute the program polymorphically simply by telling each
instruction to execute itself with an executeInstruction message.
b) Modify the infix-to-postfix conversion algorithm of
Exercise 15.12 to
process multi-digit integer operands and single-letter variable name operands.
Hint: Standard library function strtok can be used to locate each constant and
variable in an expression, and constants can be converted from strings to
integers using standard library function atoi. (Note: The data representation of
the postfix expression must be altered to support variable names and integer
constants.)
c) Modify the postfix evaluation algorithm to process multi-digit integer
operands and variable name operands. Also, the algorithm should now
implement the "hook" discussed above so that SML instructions are produced
rather than directly evaluating the expression. Hint: Standard library function
strtok can be used to locate each constant and variable in an expression, and
constants can be converted from strings to integers using standard library
function atoi. (Note: The data representation of the postfix expression must be
altered to support variable names and integer constants.)
d) Build the compiler. Incorporate parts (b) and (c) for evaluating expressions in
let statements. Your program should contain a function that performs the first
pass of the compiler and a function that performs the second pass of the
compiler. Both functions can call other functions to accomplish their tasks.
Make your compiler as object oriented as possible.
e) Build the compiler. Incorporate parts (b) and (c) for evaluating expressions in
let statements. Your program should contain a function that performs the first
pass of the compiler and a function that performs the second pass of the
compiler. Both functions can call other functions to accomplish their tasks.
Make your compiler as object oriented as possible.
Exercise 15.28
(Optimizing the Simple Compiler) When a program is compiled and converted
into SML, a set of instructions is generated. Certain combinations of instructions
often repeat themselves, usually in triplets called productions. A production
normally consists of three instructions such as load, add, and store. For
example,
Fig. 15.27 illustrates five of the SML instructions that were produced
in the compilation of the program in
Fig. 15.25 The first three instructions are
the production that adds 1 to y. Note that instructions 06 and 07 store the
accumulator value in temporary location 96, then load the value back into the
accumulator so instruction 08 can store the value in location 98. Often a
production is followed by a load instruction for the same location that was just
stored. This code can be optimized by eliminating the store instruction and the
subsequent load instruction that operate on the same memory location, thus
enabling the Simpletron to execute the program faster.
Figure 15.28 illustrates
the optimized SML for the program of
Fig. 15.25. Note that there are four fewer
instructions in the optimized code--a memory-space savings of 25%.
Modify the compiler to provide an option for optimizing the Simpletron
Machine Language code it produces. Manually compare the non-optimized code
with the optimized code, and calculate the percentage reduction.
Exercise 15.29
(Modifications to the Simple compiler) Perform the following modifications to
the Simple compiler. Some of these modifications may also require
modifications to the Simpletron Simulator program written in Exercise 5.19.
a) Allow the modulus operator (%) to be used in let statements. Simpletron
Machine Language must be modified to include a modulus instruction.
b) Allow exponentiation in a let statement using ^ as the exponentiation
operator. Simpletron Machine Language must be modified to include an
exponentiation instruction.
c) Allow the compiler to recognize uppercase and lowercase letters in Simple
statements (e.g., 'A' is equivalent to 'a'). No modifications to the Simpletron
Simulator are required.
d) Allow input statements to read values for multiple variables such as input x,
y. No modifications to the Simpletron Simulator are required.
e) Allow the compiler to output multiple values in a single print statement such
as print a, b, c. No modifications to the Simpletron Simulator are required.
f) Add syntax-checking capabilities to the compiler so error messages are
output when syntax errors are encountered in a Simple program. No
modifications to the Simpletron Simulator are required.
g) Allow arrays of integers. No modifications to the Simpletron Simulator are
required.
h) Allow subroutines specified by the Simple commands gosub and return.
Command gosub passes program control to a subroutine and command return
passes control back to the statement after the gosub. This is similar to a function
call in C++. The same subroutine can be called from many gosub commands
distributed throughout a program. No modifications to the Simpletron Simulator
are required.
i) Allow repetition structures of the form
for x = 2 to 10 step 2
Simple statements
next
This for statement loops from 2 to 10 with an increment of 2. The next line
marks the end of the body of the for line. No modifications to the Simpletron
Simulator are required.
j) Allow repetition structures of the form
for x = 2 to 10
Simple statements
next
This for statement loops from 2 to 10 with a default increment of 1. No
modifications to the Simpletron Simulator are required.
k) Allow the compiler to process string input and output. This requires the
Simpletron Simulator to be modified to process and store string values. Hint:
Each Simpletron word can be divided into two groups, each holding a two-digit
integer. Each two-digit integer represents the ASCII decimal equivalent of a
character. Add a machine language instruction that will print a string beginning
at a certain Simpletron memory location. The first half of the word at that
location is a count of the number of characters in the string (i.e., the length of the
string). Each succeeding half word contains one ASCII character expressed as
two decimal digits. The machine language instruction checks the length and
prints the string by translating each two-digit number into its equivalent
character.
l) Allow the compiler to process floating-point values in addition to integers.
The Simpletron Simulator must also be modified to process floating-point
values.
Exercise 15.30
(A Simple interpreter) An interpreter is a program that reads a high-level
language program statement, determines the operation to be performed by the
statement, and executes the operation immediately. The high-level language
program is not converted into machine language first. Interpreters execute
slowly because each statement encountered in the program must first be
deciphered. If statements are contained in a loop, the statements are deciphered
each time they are encountered in the loop. Early versions of the BASIC
programming language were implemented as interpreters.
Write an interpreter for the Simple language discussed in
Exercise 15.26. The
program should use the infix-to-postfix converter developed in
Exercise 15.12
and the postfix evaluator developed in
Exercise 15.13 to evaluate expressions in
a let statement. The same restrictions placed on the Simple language in
Exercise
15.26 should be adhered to in this program. Test the interpreter with the Simple
programs written in
Exercise 15.26. Compare the results of running these
programs in the interpreter with the results of compiling the Simple programs
and running them in the Simpletron Simulator built in Exercise 5.19.
Exercise 15.31
(Insert/Delete Anywhere in a Linked List) Our linked list class template allowed
insertions and deletions at only the front and the back of the linked list. These
capabilities were convenient for us when we used private inheritance and
composition to produce a stack class template and a queue class template with a
minimal amount of code simply by reusing the list class template. Actually
linked lists are more general that those we provided. Modify the linked list class
template we developed in this chapter to handle insertions and deletions
anywhere in the list.
Exercise 15.32
(List and Queues without Tail Pointers) Our implementation of a linked list (
Fig. 15.3) used both a firstPtr and a lastPtr. The lastPtr was useful for the
insertAtBack and removeFromBack member functions of the List class. The
insertAtBack function corresponds to the enqueue member function of the
Queue class. Rewrite the List class so that it does not use a lastPtr. Thus, any
operations on the tail of a list must begin searching the list from the front. Does
this affect our implementation of the Queue class (
Fig. 15.12)?
Exercise 15.33
Use the composition version of the stack program (
Fig. 15.11) to form a
complete working stack program. Modify this program to inline the member
functions. Compare the two approaches. Summarize the advantages and
disadvantages of inlining member functions.
Exercise 15.34
(Performance of Binary Tree Sorting and Searching) One problem with the
binary tree sort is that the order in which the data is inserted affects the shape of
the tree--for the same collection of data, different orderings can yield binary
trees of dramatically different shapes. The performance of the binary tree sorting
and searching algorithms is sensitive to the shape of the binary tree. What shape
would a binary tree have if its data were inserted in increasing order? in
decreasing order? What shape should the tree have to achieve maximal
searching performance?
Exercise 15.35
(Indexed Lists) As presented in the text, linked lists must be searched
sequentially. For large lists, this can result in poor performance. A common
technique for improving list searching performance is to create and maintain an
index to the list. An index is a set of pointers to various key places in the list. For
example, an application that searches a large list of names could improve
performance by creating an index with 26 entries--one for each letter of the
alphabet. A search operation for a last name beginning with 'Y' would then first
search the index to determine where the 'Y' entries begin, and then "jump into"
the list at that point and search linearly until the desired name is found. This
would be much faster than searching the linked list from the beginning. Use the
List class of
Fig. 15.3 as the basis of an IndexedList class. Write a program that
demonstrates the operation of indexed lists. Be sure to include member functions
insertInIndexedList, searchIndexedList, and deleteFromIndexedList.